Speeding Up Multi-class SVM Evaluation by PCA and Feature Selection

Authors

  • Hansheng Lei
  • Venu Govindaraju
Abstract

Support Vector Machine (SVM) is a state-of-the-art learning machine that has been very fruitful not only in pattern recognition but also in data mining areas such as feature selection on microarray data, novelty detection, and the scalability of algorithms. SVM has been extensively and successfully applied to feature selection for genetic diagnosis. In this paper we do the opposite: we use the results achieved by applying SVM to feature selection to improve SVM itself. Removing redundant and non-discriminative features greatly reduces the computational time of SVM and thus speeds up evaluation. We propose combining Principal Component Analysis (PCA) and Recursive Feature Elimination (RFE) in multi-class SVM. We found that SVM is invariant under the PCA transform, which makes PCA a desirable dimension reduction method for SVM. RFE, in turn, is a suitable feature selection method for binary SVM. However, RFE requires many iterations, and each iteration trains an SVM once. This makes RFE infeasible for multi-class SVM without PCA dimension reduction, especially when the training set is large. Combining PCA with RFE is therefore necessary. Our experiments on the benchmark database MNIST and other commonly used datasets show that PCA and RFE can speed up SVM evaluation by roughly a factor of 10 while maintaining comparable accuracy.
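One way to read the invariance claim in the abstract can be checked numerically. The sketch below is not the authors' derivation; it assumes a full PCA (centering plus rotation onto the covariance eigenvectors, with no components dropped) and a Gaussian (RBF) kernel, which depends only on pairwise distances. Under those assumptions the kernel matrix, and hence a kernel SVM trained on it, should be unchanged by the transform. The helper names `pca_transform` and `rbf_kernel`, the synthetic data, and the value of `gamma` are illustrative choices, not taken from the paper.

```python
import numpy as np

def pca_transform(X):
    """Full PCA: center the data, then rotate onto the covariance eigenvectors."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # columns of eigvecs are orthonormal
    order = np.argsort(eigvals)[::-1]        # sort components by decreasing variance
    return Xc @ eigvecs[:, order], eigvals[order]

def rbf_kernel(X, gamma=0.1):
    """Gaussian kernel matrix exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clip tiny negative round-off

# Synthetic correlated data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8)) @ rng.normal(size=(8, 8))

Z, variances = pca_transform(X)

# With all components kept, PCA is a translation plus a rotation, so pairwise
# distances (and therefore the RBF kernel matrix) are preserved.
K_original = rbf_kernel(X)
K_rotated = rbf_kernel(Z)
kernels_match = np.allclose(K_original, K_rotated)

# Dropping low-variance components only approximates the original kernel.
k = 4
K_reduced = rbf_kernel(Z[:, :k])
max_kernel_error = np.abs(K_original - K_reduced).max()

print("kernel preserved by full PCA:", kernels_match)
print("max kernel change with top", k, "components:", max_kernel_error)
```

The exact equality holds only while every component is kept; once components are discarded the kernel changes, and how much accuracy that costs is an empirical question of the kind the experiments in the abstract address. The RFE stage is not shown here.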

Similar resources

Speeding Up Multi-class SVM Evaluation via Principle Component Analysis and Recursive Feature Elimination

Support Vector Machines (SVM) have been shown to yield state-of-the-art performance in many pattern analysis applications. Feature selection methods for SVMs are often used to reduce the complexity of learning and evaluation. In this article we propose to combine a standard method, Recursive Feature Elimination (RFE), with Principal Component Analysis (PCA) to produce a multi-class SVM framewor...

Feature Selection Using Multi Objective Genetic Algorithm with Support Vector Machine

Different approaches have been proposed for feature selection to obtain a suitable feature subset among all features. These methods search the feature space for feature subsets that satisfy some criteria or optimize several objective functions. The objective functions are divided into two main groups: filter and wrapper methods. In filter methods, feature subsets are selected due to some measu...

Feature selection using genetic algorithm for classification of schizophrenia using fMRI data

In this paper we propose a new method for classification of subjects into schizophrenia and control groups using functional magnetic resonance imaging (fMRI) data. In the preprocessing step, the number of fMRI time points is reduced using principal component analysis (PCA). Then, independent component analysis (ICA) is used for further data analysis. It estimates independent components (ICs) of...

MULTI CLASS BRAIN TUMOR CLASSIFICATION OF MRI IMAGES USING HYBRID STRUCTURE DESCRIPTOR AND FUZZY LOGIC BASED RBF KERNEL SVM

Medical Image segmentation is to partition the image into a set of regions that are visually obvious and consistent with respect to some properties such as gray level, texture or color. Brain tumor classification is an imperative and difficult task in cancer radiotherapy. The objective of this research is to examine the use of pattern classification methods for distinguishing different types of...

Dimensionality Reduction for Using High-Order n-Grams in SVM-Based Phonotactic Language Recognition

SVM-based phonotactic language recognition is state-of-the-art technology. However, due to computational bounds, phonotactic information is usually limited to low-order phone n-grams (up to n = 3). In a previous work, we proposed a feature selection algorithm, based on n-gram frequencies, which allowed us to work successfully with high-order n-grams on the NIST 2007 LRE database. In this work, we ...

Publication date: 2004